DESeq2 differential expression analysis

Using DESeq2, each contrast was investigated for differential expression. The DESeq2 model assumes that read counts follow a negative binomial distribution.
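To make the model assumption concrete, the sketch below evaluates a negative binomial in the mean/dispersion form used by DESeq2, where a gene with mean count μ and dispersion α has variance μ + α·μ². It is a minimal Python illustration with hypothetical function names, not part of this analysis; DESeq2 itself (an R package) estimates μ and α per gene from the data.

```python
import math

# Hypothetical helpers for illustration only; not DESeq2 code.


def nb_log_pmf(k: int, mu: float, alpha: float) -> float:
    """Log-probability of a count k under a negative binomial with mean mu
    and dispersion alpha, i.e. variance = mu + alpha * mu**2."""
    if mu <= 0 or alpha <= 0 or k < 0:
        raise ValueError("need mu > 0, alpha > 0 and k >= 0")
    r = 1.0 / alpha        # "size" parameter of the classic NB(r, p) form
    p = r / (r + mu)       # success probability that gives mean mu
    return (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
            + r * math.log(p) + k * math.log1p(-p))


def nb_pmf(k: int, mu: float, alpha: float) -> float:
    return math.exp(nb_log_pmf(k, mu, alpha))


def nb_variance(mu: float, alpha: float) -> float:
    return mu + alpha * mu ** 2


# Hypothetical gene: mean count 10, dispersion 0.2 -> variance 30 (a Poisson would give 10)
probs = [nb_pmf(k, 10.0, 0.2) for k in range(2000)]
mean = sum(k * q for k, q in enumerate(probs))
var = sum((k - mean) ** 2 * q for k, q in enumerate(probs))
print(round(mean, 6), round(var, 6), nb_variance(10.0, 0.2))
```

Because the variance grows quadratically with the mean, this model can accommodate the overdispersion typical of RNA-seq counts, which a Poisson model (variance equal to the mean) cannot.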

The DESeq2 analysis approach is described here.

The table below lists the top differentially expressed genes. We use the DESeq2 rlog (regularized log) transformation of the read counts for plotting and downstream analysis. A significance threshold of 0.05 is applied to the adjusted p-value (padj), and only genes with an absolute log2 fold change greater than 0.5 are considered significant.
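The two thresholds amount to a simple filter. Below is a hedged Python sketch, assuming the 0.05 cutoff is applied to DESeq2's adjusted p-value (padj), as in the heatmap definition later in this report; the `GeneResult` record and the function names are hypothetical stand-ins for rows of a DESeq2 results table.

```python
from typing import Iterable, List, NamedTuple, Optional


class GeneResult(NamedTuple):
    """One row of a DESeq2-style results table (hypothetical container)."""
    gene: str
    log2_fold_change: float
    padj: Optional[float]  # None stands in for DESeq2's NA


def is_significant(res: GeneResult, padj_cutoff: float = 0.05,
                   lfc_cutoff: float = 0.5) -> bool:
    # Genes without an adjusted p-value are never called significant
    if res.padj is None:
        return False
    return res.padj < padj_cutoff and abs(res.log2_fold_change) > lfc_cutoff


def significant_genes(results: Iterable[GeneResult], padj_cutoff: float = 0.05,
                      lfc_cutoff: float = 0.5) -> List[GeneResult]:
    """Apply both thresholds and rank the hits by adjusted p-value."""
    hits = [r for r in results if is_significant(r, padj_cutoff, lfc_cutoff)]
    return sorted(hits, key=lambda r: r.padj)


table = [
    GeneResult("geneA", 1.2, 0.01),
    GeneResult("geneB", -0.8, 0.001),
    GeneResult("geneC", 0.3, 0.0001),  # significant padj, but fold change too small
    GeneResult("geneD", 2.0, 0.2),     # large fold change, but not significant
    GeneResult("geneE", 1.0, None),    # padj missing (NA)
]
print([r.gene for r in significant_genes(table)])
```

Genes with a missing padj (reported as NA by DESeq2, for example after independent filtering) are treated as not significant.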

All resulting plots are also exported as SVG files.

MA plots

The MA plot shows the DESeq2 results; the version displayed depends on whether log fold change (LFC) shrinkage has been performed.

No LFC Shrinkage
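An MA plot places each gene's mean normalized count (DESeq2's `baseMean`, usually on a log-scaled axis) on the x axis and its log2 fold change on the y axis. The Python sketch below computes such coordinates from a normalized count table. The helper name and inputs are hypothetical, and the fold change here is a naive ratio of group means with a pseudocount, not DESeq2's model-based (and optionally shrunken) estimate.

```python
import math
from typing import Dict, List, Tuple


def ma_points(norm_counts: Dict[str, List[float]], groups: List[str],
              numerator: str, denominator: str,
              pseudocount: float = 0.5) -> Dict[str, Tuple[float, float]]:
    """Hypothetical helper: {gene: (mean_normalized_count, naive_log2_fold_change)}.

    The x value mirrors DESeq2's baseMean (mean over all samples); the y value
    is a naive log2 ratio of group means, NOT DESeq2's GLM/shrunken estimate.
    """
    num_idx = [i for i, g in enumerate(groups) if g == numerator]
    den_idx = [i for i, g in enumerate(groups) if g == denominator]
    if not num_idx or not den_idx:
        raise ValueError("both groups need at least one sample")
    points = {}
    for gene, counts in norm_counts.items():
        base_mean = sum(counts) / len(counts)
        mean_num = sum(counts[i] for i in num_idx) / len(num_idx)
        mean_den = sum(counts[i] for i in den_idx) / len(den_idx)
        # The pseudocount avoids log(0) and division by zero for unexpressed genes
        lfc = math.log2((mean_num + pseudocount) / (mean_den + pseudocount))
        points[gene] = (base_mean, lfc)
    return points


counts = {"g1": [10.0, 10.0, 40.0, 40.0], "g2": [5.0, 5.0, 5.0, 5.0]}
groups = ["ctrl", "ctrl", "trt", "trt"]
print(ma_points(counts, groups, numerator="trt", denominator="ctrl", pseudocount=0.0))
```

LFC shrinkage mainly pulls the noisy fold changes of low-count genes (the left side of an MA plot) toward zero, which is where shrunken and unshrunken MA plots typically differ most.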

Heatmap of differentially expressed genes

The following heatmap shows the top differentially expressed genes as defined above. Genes (rows) are clustered; samples (columns) are not. Values are DESeq2 rlog-transformed counts, centered on the mean of each row, so each cell shows how far a gene's expression in that sample lies from the gene's average.

This allows clusters of differentially expressed genes to be identified to a certain degree.
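Row centering is a single subtraction per gene. The minimal Python sketch below illustrates it on made-up values; the function name is hypothetical, and the report's actual plotting code is not shown here.

```python
from typing import Dict, List


def center_rows(matrix: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Hypothetical helper: subtract each gene's (row's) mean from its values.

    Input values are assumed to be on the rlog (log2) scale, so the output
    is the log2 deviation of each sample from that gene's average."""
    centered = {}
    for gene, values in matrix.items():
        mean = sum(values) / len(values)
        centered[gene] = [v - mean for v in values]
    return centered


rlog_values = {"geneA": [5.0, 6.0, 7.0], "geneB": [10.0, 10.0, 10.0]}
print(center_rows(rlog_values))
```

Centering alone keeps values in log2 units; some heatmap tools instead scale rows to z-scores (additionally dividing by each row's standard deviation), which the description above does not mention.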

Volcano plot

The following volcano plot shows each gene's log2 fold change against its p-value (on a -log10 scale), highlighting the top differentially expressed genes in both dimensions.
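Each point in a volcano plot is one gene, with the log2 fold change on the x axis and -log10(p) on the y axis. The report does not state whether raw or adjusted p-values are plotted, so this Python sketch simply uses whatever p-values are passed in; the cutoffs default to the 0.05 and 0.5 thresholds above, and the function name is hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple


def volcano_points(lfcs: Sequence[float], pvalues: Sequence[Optional[float]],
                   p_cutoff: float = 0.05, lfc_cutoff: float = 0.5
                   ) -> List[Tuple[float, float, str]]:
    """Hypothetical helper: (x, y, label) per gene, x = log2 FC, y = -log10(p)."""
    points = []
    for lfc, p in zip(lfcs, pvalues):
        if p is None:                      # skip genes without a p-value (NA)
            continue
        y = -math.log10(max(p, 1e-300))    # clamp so p == 0 does not fail
        if p < p_cutoff and lfc > lfc_cutoff:
            label = "up"
        elif p < p_cutoff and lfc < -lfc_cutoff:
            label = "down"
        else:
            label = "not significant"
        points.append((lfc, y, label))
    return points


pts = volcano_points([1.0, -2.0, 0.1, 3.0], [0.001, 0.01, 0.5, None])
print([label for _, _, label in pts])
```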

Top over- and underexpressed genes

Differentially expressed genes, defined as genes with padj < 0.05 and an absolute log2 fold change > 0.5, are plotted in a heatmap again. To better contrast the two tested groups, samples from all other groups have been removed from the following heatmaps. In addition, a separate heatmap is generated for the genes overexpressed in each of the two sample groups. The first two heatmaps show total gene expression as DESeq2 rlog-transformed values.

The red and blue heatmaps show each gene's expression in each sample (column) relative to that gene's mean expression, to better display the changes in gene expression between the two groups.
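The per-group heatmaps rest on two small steps: dropping samples outside the contrast and splitting the DE genes by direction. The Python sketch below shows both under stated assumptions: the helper names, sample names and values are hypothetical, and the input genes are assumed to be already filtered by the padj and fold change thresholds.

```python
from typing import Dict, List, Tuple


def two_group_samples(sample_groups: Dict[str, str], group_a: str,
                      group_b: str) -> List[str]:
    """Hypothetical helper: keep only samples from the two contrasted groups."""
    return [s for s, g in sample_groups.items() if g in (group_a, group_b)]


def split_by_direction(de_lfc: Dict[str, float]) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: split DE genes by the sign of their log2 fold change.

    In a DESeq2 contrast, a positive value means higher expression in the
    numerator group and a negative value means higher in the denominator group."""
    higher_in_numerator = sorted(g for g, lfc in de_lfc.items() if lfc > 0)
    higher_in_denominator = sorted(g for g, lfc in de_lfc.items() if lfc < 0)
    return higher_in_numerator, higher_in_denominator


samples = {"s1": "treated", "s2": "control", "s3": "other", "s4": "treated"}
de_genes = {"geneA": 1.2, "geneB": -0.7, "geneC": 2.0}  # already filtered DE genes
print(two_group_samples(samples, "treated", "control"))
print(split_by_direction(de_genes))
```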

Gene Set Enrichment

While the list of differentially expressed genes is itself of interest, gene set enrichment analysis (GSEA) provides a way to discover biological pathways associated with them. Here, we run GSEA against:
- MSigDB database
- KEGG pathway database
- Reactome database
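At the core of GSEA is a running-sum enrichment score (ES): walking down a ranked gene list, the score rises at gene-set members and falls at all other genes. The report does not name its GSEA implementation, and tools for this typically also assess significance by permutation and normalize the ES (NES), which this sketch omits. It only computes the raw weighted ES from the original GSEA method, assuming genes are ranked by a statistic such as log2 fold change (the ranking metric used here is not stated). Names and values are hypothetical.

```python
from typing import Sequence, Set, Tuple


def enrichment_score(ranked: Sequence[Tuple[str, float]], gene_set: Set[str],
                     weight: float = 1.0) -> float:
    """Hypothetical helper: raw running-sum enrichment score (no permutations).

    `ranked` must be sorted by its statistic in descending order. Hits step up
    in proportion to |statistic|**weight; misses step down uniformly.
    The ES is the running sum's maximum deviation from zero."""
    hits = [gene in gene_set for gene, _ in ranked]
    n_hits = sum(hits)
    n_miss = len(ranked) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must cover some, but not all, ranked genes")
    hit_norm = sum(abs(stat) ** weight for (_, stat), h in zip(ranked, hits) if h)
    if hit_norm == 0:
        raise ValueError("all gene-set members have a zero statistic")
    running, es = 0.0, 0.0
    for (_, stat), is_hit in zip(ranked, hits):
        if is_hit:
            running += abs(stat) ** weight / hit_norm
        else:
            running -= 1.0 / n_miss
        if abs(running) > abs(es):
            es = running
    return es


ranking = [("a", 3.0), ("b", 2.0), ("c", 1.0), ("d", -1.0), ("e", -2.0)]
print(enrichment_score(ranking, {"a", "b"}))   # set concentrated at the top
print(enrichment_score(ranking, {"d", "e"}))   # set concentrated at the bottom
```

A positive ES means the set is concentrated among genes with high statistics (here, upregulated genes); a negative ES means it is concentrated at the bottom of the ranking.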

As a comparison, we also run an over-representation test (ORA) on the MSigDB pathway database; GSEA is generally the more robust of the two techniques.
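An over-representation test asks whether a gene set contains more DE genes than expected by chance, commonly via a hypergeometric test (equivalent to a one-sided Fisher's exact test). The report does not state which implementation or background gene universe was used, so the Python sketch below is only an illustration with a hypothetical function name and made-up counts; multiple-testing correction across gene sets is omitted.

```python
from math import comb


def ora_pvalue(k: int, n: int, big_k: int, big_n: int) -> float:
    """Hypothetical helper: one-sided hypergeometric p-value P(X >= k).

    big_n: genes in the universe (background)
    big_k: universe genes annotated to the gene set
    n:     differentially expressed genes (drawn from the universe)
    k:     differentially expressed genes that are in the gene set
    """
    if not (0 <= k <= n <= big_n and 0 <= big_k <= big_n):
        raise ValueError("inconsistent counts")
    # Exact integer arithmetic; math.comb returns 0 for impossible draws
    tail = sum(comb(big_k, i) * comb(big_n - big_k, n - i)
               for i in range(k, min(n, big_k) + 1))
    return tail / comb(big_n, n)


# Made-up numbers: 20,000 background genes, a 200-gene set,
# 500 DE genes of which 15 fall into the set (5 would be expected by chance)
print(ora_pvalue(k=15, n=500, big_k=200, big_n=20000))
```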

Gene set enrichment plots

The following plots show GSEA results for the KEGG and Reactome enrichments. The UpSet plot visualizes the fold change distribution for the enriched terms (bar plot). The heatmap-like plot colors each gene's enrichment in the various pathways by its (shrunken) log fold change.

The first plot is a heatmap of the log fold changes of enriched genes across the different pathways, giving a quick overview of the pathways in which each gene is enriched.

The second plot is an UpSet plot of overlapping genes, showing whether a few significant genes are driving many of the enrichments.

The last plot is a network plot linking enriched pathways to their genes, highlighting individual genes that occur in multiple enriched pathways.
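The gene-pathway heatmap, UpSet plot and network plot are all views of the same gene-to-pathway relationships. As a hedged illustration, the Python sketch below derives the underlying data for each, representing every enriched pathway as a set of (for example core-enriched) genes. All function names and values are hypothetical; the report's plotting functions are not shown.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple


def heat_matrix(pathways: Dict[str, Set[str]], lfc: Dict[str, float]
                ) -> Tuple[List[str], List[str], List[List[Optional[float]]]]:
    """Hypothetical helper: gene x pathway matrix holding the gene's log2 fold
    change where the gene belongs to the pathway, otherwise None (empty cell)."""
    names = sorted(pathways)
    genes = sorted(set().union(*pathways.values()))
    matrix = [[lfc.get(g) if g in pathways[p] else None for p in names]
              for g in genes]
    return genes, names, matrix


def upset_counts(pathways: Dict[str, Set[str]]) -> Dict[Tuple[str, ...], int]:
    """Hypothetical helper: genes per exact combination of pathways
    (each gene counted once, as in the bars of an UpSet plot)."""
    membership: Dict[str, Set[str]] = defaultdict(set)
    for name, genes in pathways.items():
        for g in genes:
            membership[g].add(name)
    counts: Dict[Tuple[str, ...], int] = defaultdict(int)
    for names in membership.values():
        counts[tuple(sorted(names))] += 1
    return dict(counts)


def shared_genes(pathways: Dict[str, Set[str]], min_pathways: int = 2) -> List[str]:
    """Hypothetical helper: genes linked to several pathways (network hubs)."""
    counts: Dict[str, int] = defaultdict(int)
    for genes in pathways.values():
        for g in genes:
            counts[g] += 1
    return sorted(g for g, c in counts.items() if c >= min_pathways)


enriched = {"P1": {"a", "b", "c"}, "P2": {"b", "c", "d"}, "P3": {"c"}}
fold_changes = {"a": 1.0, "b": -0.5, "c": 2.0, "d": 0.7}
print(heat_matrix(enriched, fold_changes))
print(upset_counts(enriched))
print(shared_genes(enriched))
```

A gene such as `c` that appears in every pathway is exactly the pattern the UpSet and network plots are meant to expose: one gene contributing to many enrichments.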

MSigDB enrichments are tested using both GSEA and over-representation tests. GSEA is statistically more robust, but because the over-representation test tends to return more results, we include it as well.

Note:
- For the over-representation test, GeneRatio denotes the percentage of genes found in the designated gene set.
- For GSEA, GeneRatio denotes \(\frac{\text{core enriched genes}}{\text{set size}} \times 100\).
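As a worked example of the GSEA GeneRatio formula, the Python sketch below computes it from a result row. It assumes the core-enriched genes arrive as one "/"-separated string, as in clusterProfiler-style result tables; the function name is hypothetical, and `sep` can be changed if your tables differ.

```python
def gsea_gene_ratio(core_enrichment: str, set_size: int, sep: str = "/") -> float:
    """Hypothetical helper: GSEA GeneRatio = core-enriched genes / set size * 100.

    Assumes the core-enriched genes are one separator-joined string."""
    core = [g for g in core_enrichment.split(sep) if g]
    if set_size <= 0 or len(core) > set_size:
        raise ValueError("core enrichment cannot exceed a positive set size")
    return len(core) / set_size * 100


# 3 core-enriched genes in a 12-gene set
print(gsea_gene_ratio("TP53/MYC/EGFR", set_size=12))
```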

Hallmark_gsea

Hallmark_fisher

MsigDB_C2

MsigDB_C5

MsigDB_C8

MsigDB_C3

Reactome_GSEA